Sure independence screening for ultrahigh dimensional feature space
Abstract
High dimensionality is a growing feature in many areas of contemporary statistics. Variable selection is fundamental to high-dimensional statistical modeling. For problems of large or huge scale p_n, computational cost and estimation accuracy are always two top concerns. In a seminal paper, Candes and Tao (2007) propose a minimum ℓ1 estimator, the Dantzig selector, and show that it mimics the ideal risk within a logarithmic factor log p_n. Their innovative procedure and remarkable result are challenged when the dimensionality is ultrahigh: the factor log p_n can be large and their uniform uncertainty condition can fail. Motivated by these concerns, in this paper we introduce the concept of sure screening and propose a fast and straightforward method via iteratively thresholded ridge regression, called Sure Independence Screening (SIS), to reduce high dimensionality to a relatively large scale d_n, say below sample size. An appealing special case of SIS is componentwise regression. In a fairly general asymptotic framework, SIS is shown to possess the sure screening property even for exponentially growing dimensionality. With ultrahigh dimensionality reduced accurately to below sample size, variable selection becomes much easier and can be accomplished by refined lower-dimensional methods that have oracle properties. Depending on the scale of d_n, one can use, for example, the Dantzig selector or Lasso, the SCAD-penalized least squares method of Fan and Li (2001), or the adaptive Lasso of Zou (2006). This talk is based on joint work with Professor Jianqing Fan.
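As a rough illustration of the componentwise-regression special case mentioned in the abstract, the following Python sketch ranks features by the absolute marginal correlation with the response and keeps the top d. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `sis_screen`, the simulated data, and the choice d = n / log n are all illustrative assumptions; the abstract itself only says that the reduced scale d_n should fall below the sample size.

```python
import numpy as np

def sis_screen(X, y, d):
    """Keep the d features with the largest absolute marginal correlation with y.

    Hypothetical helper for illustration; columns of X are standardized so the
    componentwise scores are comparable across features.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    scale = Xc.std(axis=0)
    scale[scale == 0] = 1.0            # guard against constant columns
    Xs = Xc / scale
    omega = Xs.T @ (y - y.mean())      # componentwise regression scores
    order = np.argsort(-np.abs(omega)) # largest |score| first
    return np.sort(order[:d])

# Simulated example (assumed setup): 3 active features out of p = 1000.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[0, 1, 2]] = 3.0
y = X @ beta + rng.standard_normal(n)

d = int(n / np.log(n))                 # assumed screening size, below n
selected = sis_screen(X, y, d)
print(len(selected), selected[:5])
```

After screening, a lower-dimensional method such as the Lasso or SCAD could be fitted on `X[:, selected]` only, which is the two-stage strategy the abstract describes.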
Related resources
Ultrahigh Dimensional Feature Screening via RKHS Embeddings
Feature screening is a key step in handling ultrahigh dimensional data sets that are ubiquitous in modern statistical problems. Over the last decade, convex relaxation based approaches (e.g., Lasso/sparse additive model) have been extensively developed and analyzed for feature selection in high dimensional regime. But in the ultrahigh dimensional regime, these approaches suffer from several pro...
Feature Screening via Distance Correlation Learning.
This paper is concerned with screening features in ultrahigh dimensional data analysis, which has become increasingly important in diverse scientific fields. We develop a sure independence screening procedure based on the distance correlation (DC-SIS, for short). The DC-SIS can be implemented as easily as the sure independence screening procedure based on the Pearson correlation (SIS, for short...
Ultrahigh-Dimensional Multiclass Linear Discriminant Analysis by Pairwise Sure Independence Screening.
This paper is concerned with the problem of feature screening for multi-class linear discriminant analysis under ultrahigh dimensional setting. We allow the number of classes to be relatively large. As a result, the total number of relevant features is larger than usual. This makes the related classification problem much more challenging than the conventional one, where the number of classes is...
Feature Screening in Ultrahigh Dimensional Cox's Model.
Survival data with ultrahigh dimensional covariates such as genetic markers have been collected in medical studies and other fields. In this work, we propose a feature screening procedure for the Cox model with ultrahigh dimensional covariates. The proposed procedure is distinguished from the existing sure independence screening (SIS) procedures (Fan, Feng and Wu, 2010, Zhao and Li, 2012) in th...
Rejoinder: Sure independence screening for ultrahigh dimensional feature space
We are very grateful to all contributors for their stimulating comments and questions on the role of variable screening and selection on high-dimensional statistical modeling. This paper would not have been in the current form without the benefits of private communications with Professors Peter Bickel, Peter Bühlmann, Eitan Greenshtein, Qiwei Yao, Cun-Hui Zhang and Wenyang Zhang at various stag...